HEVC video encoder and decoder for multi-core

ABSTRACT

The disclosure provides a video encoder. The video encoder receives a frame and divides the frame into a plurality of tiles. The video encoder includes a plurality of video processing engines communicatively coupled with each other. Each video processing engine receives a tile of the plurality of tiles. A height of each tile is equal to a height of the frame, and each tile comprises a plurality of rows. The plurality of video processing engines includes a first and a second video processing engine. The second video processing engine is initiated after the first video processing engine processes M rows of the plurality of rows of the tile, where M is an integer.

CROSS REFERENCES TO RELATED APPLICATIONS

This application claims priority from India provisional patent application No. 2795/CHE/2013, filed on Jun. 26, 2013, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

Embodiments of the disclosure relate generally to video coding and more particularly to video encoding and decoding in multiple video hardware engines or in multi-core processors.

BACKGROUND

High Efficiency Video Coding (HEVC) is the latest video compression standard and the successor to H.264/MPEG-4 AVC (Advanced Video Coding). It was jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) as ISO/IEC 23008-2 MPEG-H Part 2 and ITU-T H.265.

A video input signal has multiple frames. HEVC divides a frame into rectangular blocks, or LCUs (largest coding units), or macro-blocks of 16×16, 32×32 or 64×64 pixels. An optimal size of the LCU is selected based on the video content. HEVC provides for dividing a video frame into multiple tiles and slices to enable parallel processing. In this scheme, discontinuities known as blocking artifacts can occur in a filtered video signal at the LCU boundaries. The blocking artifacts can arise, for instance, due to different intra predictions of the blocks, quantization effects and motion compensation. Loop filters are used in the HEVC encoder/decoder to combat blocking artifacts.

HEVC promises half the bit-rate of the current de-facto video standard, H.264, at a similar video quality, and is expected to be deployed in a wide variety of video applications including cell phones, broadcast, set-top boxes, video conferencing, video surveillance and automotive. HEVC is enabling the industry to transition to 4K (ultra-high-definition (HD)) resolutions due to its better compression efficiency and transparent quality. The performance requirement for an HEVC video solution can vary widely based on the application area. This poses a new challenge to architects in designing HEVC hardware and/or software solutions.

An approach of designing a single monolithic engine for ultra-HD resolution results in a complex design of hardware and software. Also, the single monolithic engine is a non-optimal solution for lower-resolution video, for example HD (high definition) or D1 (standard definition).

An alternative approach for performance up-scaling is to use multiple copies of video hardware engines and/or processor cores. This solution has issues in partitioning frames across these multiple cores due to loop filter dependencies across slices and tiles.

The prior approaches to handling the loop filter dependencies have several drawbacks. A first approach is to disable loop filtering. This approach degrades the quality of the video at slice/tile boundaries. A second approach is to enable loop filtering and control a rate of encoding at the boundaries of the slices/tiles. The controlled rate of video encoding in this approach degrades video quality in other portions of the frame in addition to the boundaries of the slices/tiles.

A third approach is to provide multiple video processing engines, where each engine processes a separate frame. This approach results in frame latency and hence is not efficient for applications such as video conferencing, video surveillance and gaming. A fourth approach is to use multiple video processing engines for processing a video along with a separate loop filter. The multiple video processing engines perform functions such as motion estimation, transform and quantization. After these processing operations, the separate loop filter performs loop filtering. This approach increases the overhead of the system, since additional memory bandwidth is required for the input and output of the separate loop filter, and it also increases the processing cycles used for video encoding/decoding.

SUMMARY

This Summary is provided to comply with 37 C.F.R. §1.73, requiring a summary of the invention briefly indicating the nature and substance of the invention. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.

An embodiment provides a video encoder. The video encoder receives a frame and divides the frame into a plurality of tiles. The video encoder includes a plurality of video processing engines communicatively coupled with each other. Each video processing engine receives a tile of the plurality of tiles. A height of each tile is equal to a height of the frame, and each tile comprises a plurality of rows. The plurality of video processing engines includes a first and a second video processing engine. The second video processing engine is initiated after the first video processing engine processes M rows of the plurality of rows of the tile, where M is an integer.

Another embodiment provides a video decoder. The video decoder receives a compressed bit-stream corresponding to a frame. The frame includes a plurality of tiles. The video decoder includes a plurality of video processing engines communicatively coupled with each other. Each video processing engine receives a compressed bit-stream corresponding to a tile of the plurality of tiles. Each tile comprises a plurality of rows. A height of each tile is equal to a height of the frame, and a width of each tile is equal to a width of the frame divided by a number of video processing engines in the video decoder. The plurality of video processing engines includes a first and a second video processing engine. The second video processing engine is initiated after the first video processing engine processes the compressed bit-stream corresponding to M rows of the plurality of rows of the tile, where M is an integer.

Other aspects and example embodiments are provided in the Drawings and the Detailed Description that follows.

BRIEF DESCRIPTION OF THE VIEWS OF DRAWINGS

FIG. 1 illustrates a block diagram of a video encoder, according to an example embodiment;

FIG. 2 illustrates a frame received in a video encoder, according to an embodiment;

FIG. 3 illustrates a timing diagram of a video encoder, according to an embodiment;

FIG. 4 illustrates a flowchart of a method of video encoding, according to an embodiment;

FIG. 5 illustrates a computing device, according to an embodiment; and

FIG. 6 is an example environment in which various aspects of the present disclosure may be implemented.

DETAILED DESCRIPTION OF THE EMBODIMENTS

FIG. 1 illustrates a block diagram of a video encoder 100, according to an example embodiment. The video encoder 100 includes a plurality of video processing engines 101. The plurality of video processing engines 101 includes a first video processing engine 102, a second video processing engine 104, a third video processing engine 106, a fourth video processing engine 108 and an N^(th) video processing engine 110. It is to be noted that the video processing engines 102, 104, 106, 108 and 110 are according to an example embodiment; in general, the video encoder 100 includes one or more video processing engines.

A video processing engine, in one example, is a hard-wired processor or ASIC (application specific integrated circuit). In another example, the video processing engine is programmable circuitry that can be configured to perform multiple functions. In an additional example, the video processing engine is software implemented on a processing device. In yet another example, the video processing engine is a combination of a hard-wired processor and software implemented on a processing device. The processing device can be, for example, a CISC-type (Complex Instruction Set Computer) CPU, a RISC-type (Reduced Instruction Set Computer) CPU, or a digital signal processor (DSP). In one version, the video processing engine is an HEVC video hardware engine that includes blocks such as (but not limited to) a motion estimation block, a motion compensation block, a quantization and transform block and a loop filter block.

Each video processing engine of the plurality of video processing engines 101 is communicatively coupled with the others. In one version, each video processing engine communicates with the other video processing engines through a direct path and/or an indirect path. For example, the first video processing engine 102 communicates with the second video processing engine 104 through a direct path. In another example, the first video processing engine 102 communicates with the third video processing engine 106 through an indirect path, which is through the second video processing engine 104. In another version, the plurality of video processing engines 101 communicates through a controller. The controller is on-chip or off-chip depending on the requirements of the video encoder 100. In an additional version, the plurality of video processing engines 101 communicates through a message network.

Each video processing engine includes a loop filter, and each loop filter includes a work memory. For example, the first video processing engine 102 includes a loop filter 112 and a work memory 122, and the second video processing engine 104 includes a loop filter 114 and a work memory 124. Similarly, the third video processing engine 106 includes a loop filter 116 and a work memory 126, the fourth video processing engine 108 includes a loop filter 118 and a work memory 128, and the N^(th) video processing engine 110 includes a loop filter 120 and a work memory 130. The work memory stores a set of parameters and pixel dependencies across LCUs which are required during the loop filtering operation. In one embodiment, a video processing engine includes one or more loop filters and each loop filter includes one or more work memories. In another embodiment, each of the video processing engines is coupled to a common loop filter that includes multiple memories, with each memory dedicated to a video processing engine. It is noted that the video encoder 100 as illustrated in FIG. 1 is one of many ways of implementing the video encoder 100, and variations and alternative constructions are apparent and well within the spirit and scope of the disclosure.

The video encoder 100 includes a shared memory 140. In an example, the plurality of video processing engines 101 communicates through the shared memory 140. In another example, the plurality of video processing engines 101 exchanges data through the shared memory 140. In one version, the shared memory 140 is on the same chip as the video encoder 100. In another version, the shared memory 140 is external to the video encoder 100. The shared memory 140 can be a memory such as (but not limited to) DDR (double data rate) memory, RAM (random access memory), flash memory, or disk storage. A work memory in each video processing engine is coupled to the shared memory 140. For example, the work memories 122, 124, 126, 128 and 130 are coupled to the shared memory 140. The video encoder 100 may include one or more additional components known to those skilled in the relevant art that are not discussed here for simplicity of the description.

The video encoder 100 receives a video having a plurality of frames. The video encoder 100 divides each frame of the plurality of frames into a plurality of tiles. Thus, a frame has a plurality of tiles, and a height of each tile of the plurality of tiles is equal to a height of the frame. A width of each tile is equal to a width of the frame divided by a number of video processing engines in the video encoder 100. For example, if the width of the frame is W and the video encoder 100 has 4 video processing engines, the width of a tile will be equal to W/4. Hence, the frame is divided into 4 tiles. In one example, a frame includes tiles of different widths, such that the width of one tile in the frame is W/4 and the width of another tile in the same frame is W/2. In another example, a frame includes tiles of non-uniform width such that a sum of the widths of the tiles is equal to the width of the frame. Each tile includes a plurality of rows. Each row of the plurality of rows includes a plurality of LCUs (largest coding units), and each LCU includes a plurality of pixels. Each tile is allocated to a video processing engine. In an example, a first tile is allocated to the first video processing engine 102 and a second tile is allocated to the second video processing engine 104. The first tile and the second tile are adjacent tiles of the plurality of tiles.
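
To make the arithmetic concrete, the following is a minimal Python sketch of the tile split described above. Widths are counted in LCUs; the remainder-spreading policy for widths that do not divide evenly is a hypothetical choice, since the disclosure only requires that the tile widths sum to the frame width:

```python
def tile_widths(frame_width_lcus, num_engines):
    # One vertical tile per engine, widths as uniform as possible.
    # Spreading the remainder one LCU at a time is an illustrative
    # policy; the disclosure only requires the widths to sum to W.
    base, extra = divmod(frame_width_lcus, num_engines)
    return [base + 1 if i < extra else base for i in range(num_engines)]

# A frame of width W = 8 LCUs and 4 engines gives tiles of width W/4 = 2.
assert tile_widths(8, 4) == [2, 2, 2, 2]
```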

The first video processing engine 102 processes M rows of the first tile, where M is an integer. The processing of M rows generates a set of parameters and partially filtered pixels corresponding to a set of LCUs in the M rows of the first tile. In one example, the set of LCUs includes a last LCU in each of the M rows. The last LCU in each of the M rows shares a boundary with an LCU in the second tile. The set of parameters includes (but is not limited to) motion vector information, SAO (sample adaptive offset) parameters, quantization parameters and coding unit parameters. The set of parameters and the partially filtered pixels are stored in the work memory 122, which is associated with the first video processing engine 102.

The set of parameters and the partially filtered pixels are provided to the shared memory 140 from the work memory 122. In one example, the first video processing engine 102 stores the set of parameters and the partially filtered pixels in the shared memory 140. In another example, a controller associated with the video encoder 100 stores the set of parameters and the partially filtered pixels in the shared memory 140. In an additional example, the sharing of the set of parameters and the partially filtered pixels between the shared memory 140 and the work memory is managed by a DMA (direct memory access) engine. The set of parameters and the partially filtered pixels are provided from the shared memory 140 to the second video processing engine 104 for processing the second tile. In one version, the second video processing engine 104 accesses the shared memory 140 and uses the set of parameters and the partially filtered pixels for processing the second tile. In another version, the second video processing engine 104 accesses the shared memory 140 and transfers the set of parameters and the partially filtered pixels from the shared memory 140 to the work memory 124.
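
The handoff described above can be modeled as a keyed mailbox: the producing engine posts the boundary data for a (tile, row) pair, and the consuming engine blocks until that entry exists. The following Python sketch assumes this blocking-mailbox model; the BoundaryData field names are illustrative stand-ins for the motion vector, SAO, quantization and coding-unit parameters listed above:

```python
from dataclasses import dataclass
from threading import Condition

@dataclass
class BoundaryData:
    # Illustrative stand-ins for the parameters the disclosure lists.
    motion_vectors: list
    sao_params: list
    quant_params: list
    partially_filtered_pixels: list

class SharedMemoryModel:
    """Minimal model of the shared memory 140: entries are keyed by
    (tile, row); a consumer blocks until the producer posts its row."""
    def __init__(self):
        self._slots = {}
        self._cv = Condition()

    def post(self, tile, row, data):
        with self._cv:
            self._slots[(tile, row)] = data
            self._cv.notify_all()

    def fetch(self, tile, row):
        with self._cv:
            self._cv.wait_for(lambda: (tile, row) in self._slots)
            return self._slots[(tile, row)]
```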

Thus, the second video processing engine 104 is initiated after the first video processing engine 102 processes M rows of the first tile. In one embodiment, M is equal to 1. Hence, the first video processing engine 102 processes 1 row of the first tile before the second video processing engine 104 is initiated to process the second tile. Initiating the second video processing engine 104 includes activating the second video processing engine 104 on receiving the set of parameters and the partially filtered pixels from the first video processing engine 102.

In one version, a controller associated with the plurality of video processing engines 101 initiates a video processing engine. The controller activates the second video processing engine 104 and provides the set of parameters and the partially filtered pixels obtained from the first video processing engine 102 to the second video processing engine 104. The second video processing engine 104 processes K rows of the second tile and generates a corresponding set of parameters and partially filtered pixels. The second video processing engine 104 generates the set of parameters and partially filtered pixels corresponding to a set of LCUs in the K rows of the second tile. In one example, the set of LCUs includes a last LCU in each of the K rows. The last LCU in each of the K rows shares a boundary with an LCU in the third tile. The third video processing engine 106 is initiated on receiving the set of parameters and the partially filtered pixels generated by the second video processing engine 104. The operation of the video encoder 100 is further illustrated in connection with FIG. 2. In one example, K is equal to M.

In another example embodiment, FIG. 1 is a block diagram of a video decoder. The video decoder is similar in construction to the video encoder 100; however, the operation of a video decoder is the inverse of the operation of a video encoder. The video decoder receives a compressed bit-stream corresponding to a frame. The frame includes a plurality of tiles. The video decoder includes a plurality of video processing engines 101 communicatively coupled with each other. Each video processing engine is configured to receive a compressed bit-stream corresponding to a tile of the plurality of tiles, wherein a height of each tile is equal to a height of the frame and a width of each tile is equal to a width of the frame divided by a number of video processing engines in the video decoder. The video decoder includes a first video processing engine 102 and a second video processing engine 104 of the plurality of video processing engines 101. The second video processing engine 104 is initiated after the first video processing engine 102 processes the compressed bit-stream corresponding to M rows of a tile, where M is an integer.

The first video processing engine 102 processes a compressed bit-stream corresponding to a first tile, and the second video processing engine 104 processes a compressed bit-stream corresponding to a second tile. The first tile and the second tile are adjacent tiles of the plurality of tiles. The first video processing engine 102 processes the compressed bit-stream corresponding to M rows of the first tile and generates a set of parameters and partially filtered pixels corresponding to a set of LCUs in the M rows of the first tile. The second video processing engine 104 is initiated on receiving the set of parameters and partially filtered pixels from the first video processing engine 102.

Each video processing engine is capable of performing encoding and/or decoding at a lower resolution, for example at 1080p. However, the video encoder 100 with the plurality of video processing engines 101 is capable of performing encoding and/or decoding at a higher resolution, for example at 4K.

The video encoder 100 uses a loop filter in each of the video processing engines, and hence the quality of the video processed in the video encoder 100 is not degraded. Also, a controlled rate of encoding is not used in the video encoder 100, and hence the quality of the video processed in the video encoder 100 is uniform. The video encoder 100 also finds application in areas such as (but not limited to) video conferencing, video surveillance and gaming, since frame latency does not arise in the video encoder 100. Also, no additional loop filters, which increase the memory requirements of conventional video encoders, are required in the video encoder 100. Thus, the video encoder 100 helps in achieving the high performance needed to implement ultra-HD (4K) video playback and recording.

Although the present disclosure and its advantages have been described with respect to a video encoder, all the embodiments are similarly applicable to a video decoder.

FIG. 2 illustrates a frame 200 received in a video encoder, according to an embodiment. In one version, a video is received at the video encoder and the video contains a plurality of frames. The frame 200 is a frame of the plurality of frames. The frame 200 has a height H 202 and a width W 204. The frame 200 is an 8×8 frame, i.e., the frame 200 has 8 LCUs (largest coding units) in each row and 8 LCUs in each column. LCU 1, LCU 2 and so on through LCU 63 and LCU 64 each represent an LCU in the frame 200. Each LCU further includes a plurality of pixels. The frame 200 is illustrated as an 8×8 frame to explain the logical flow and for ease of understanding, and is understood not to limit the scope of the present disclosure.

The frame 200 is divided into a plurality of tiles such as, but not limited to, tile 1, tile 2, tile 3 and tile 4. It is to be noted that the tiles illustrated in FIG. 2 are exemplary, and the frame 200, in another example, includes one or more tiles. A height of each tile of the plurality of tiles is equal to the height H 202 of the frame, i.e., each tile has a height H 202. The processing of the frame 200 is illustrated with the help of the video encoder 100 illustrated in FIG. 1.

In one example, the frame 200 is divided into a plurality of tiles by the video encoder 100. The video encoder 100 includes a plurality of video processing engines 101. The plurality of video processing engines 101 includes a first video processing engine 102, a second video processing engine 104, a third video processing engine 106, a fourth video processing engine 108 and an N^(th) video processing engine 110. A width of each tile is equal to the width W 204 of the frame divided by a number of video processing engines in the video encoder 100. In an example, when the video encoder 100 has four video processing engines, the frame 200 is divided into four tiles, each of height H 202 and width W/4. In another example, a frame includes tiles of non-uniform width such that a sum of the widths of the tiles is equal to the width of the frame W 204. Each tile includes a plurality of rows. For example, as illustrated in the figure, Row 1 to Row 8 represent the plurality of rows in each tile. Each row of the plurality of rows includes a plurality of LCUs. For example, Row 1 of tile 1 includes LCU 1 and LCU 2. Similarly, Row 2 of tile 2 includes LCU 19 and LCU 20.

Each tile is allocated to a video processing engine in the video encoder 100. For example, tile 1 is allocated to the first video processing engine 102, tile 2 is allocated to the second video processing engine 104, tile 3 is allocated to the third video processing engine 106 and tile 4 is allocated to the fourth video processing engine 108. The first video processing engine 102 processes M rows of tile 1, where M is an integer. The processing of M rows generates a set of parameters and partially filtered pixels corresponding to a set of LCUs in the M rows of tile 1. In one example, the set of LCUs includes a last LCU in each of the M rows. The last LCU in each of the M rows shares a boundary with an LCU in tile 2. In one version, the first video processing engine 102 processes 1 row of tile 1, for example Row 1. LCU 2 is the last LCU in Row 1 of tile 1. Therefore, the first video processing engine 102 generates a set of parameters and partially filtered pixels corresponding to LCU 2 in tile 1. In another version, M is 2 and the first video processing engine 102 processes 2 rows of tile 1, for example Row 1 and Row 2. The processing of LCU 2 and LCU 4 generates a set of parameters and partially filtered pixels corresponding to these LCUs.
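
The LCU numbering of FIG. 2 follows from a tile-by-tile, row-major ordering, which the following sketch captures for the 8×8 frame with four 2-LCU-wide tiles (the function name and layout constants are illustrative):

```python
LCUS_PER_ROW_PER_TILE = 2   # each tile in FIG. 2 is 2 LCUs wide
ROWS_PER_TILE = 8           # tile height equals the frame height

def lcu_number(tile, row, col):
    # 1-based tile, row and column within the tile; LCUs are numbered
    # tile by tile and row-major within each tile, as in FIG. 2.
    per_tile = LCUS_PER_ROW_PER_TILE * ROWS_PER_TILE
    return (tile - 1) * per_tile + (row - 1) * LCUS_PER_ROW_PER_TILE + col

assert [lcu_number(1, 1, c) for c in (1, 2)] == [1, 2]     # Row 1, tile 1
assert [lcu_number(2, 2, c) for c in (1, 2)] == [19, 20]   # Row 2, tile 2
assert [lcu_number(3, 1, c) for c in (1, 2)] == [33, 34]   # Row 1, tile 3
```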

The second video processing engine 104 is initiated to process tile 2 after the first video processing engine 102 processes M rows of tile 1. In an example, the first video processing engine 102 processes Row 1 of tile 1, and the set of parameters and partially filtered pixels thus generated are provided to the second video processing engine 104 for processing tile 2. The second video processing engine 104 uses the set of parameters and partially filtered pixels, received from the first video processing engine 102, for processing LCU 17 and LCU 18.

Similarly, the first video processing engine 102 processes Row 2 of tile 1, and the set of parameters and partially filtered pixels thus generated are provided to the second video processing engine 104 for processing LCU 19 and LCU 20. The first video processing engine 102 processes tile 1 in parallel with the second video processing engine 104 processing tile 2. Thus, when the first video processing engine 102 is processing LCU 3 and LCU 4, the second video processing engine 104 processes LCU 17 and LCU 18.

The second video processing engine 104 processes K rows of tile 2 and generates a corresponding set of parameters and partially filtered pixels. In one example, the second video processing engine 104 generates the set of parameters and partially filtered pixels corresponding to LCU 18 in Row 1 of tile 2. The third video processing engine 106 is initiated on receiving the set of parameters and the partially filtered pixels generated by the second video processing engine 104. The third video processing engine 106 processes Row 1 of tile 3, i.e. LCU 33 and LCU 34, on receiving the set of parameters and partially filtered pixels corresponding to LCU 18 from the second video processing engine 104.

The first video processing engine 102 processes tile 1, the second video processing engine 104 processes tile 2 and the third video processing engine 106 processes tile 3 in parallel. Thus, when the first video processing engine 102 is processing LCU 5 and LCU 6, the second video processing engine 104 processes LCU 19 and LCU 20 and the third video processing engine 106 processes LCU 33 and LCU 34. This is explained in detail in the following timing diagram of FIG. 3.

FIG. 3 illustrates a timing diagram 300 of a video encoder, according to an embodiment. The timing diagram 300 is explained using the frame 200 (illustrated in FIG. 2). The video encoder 100 receives the frame 200, which is divided into four tiles: tile 1, tile 2, tile 3 and tile 4. Each tile is processed by a video processing engine in the video encoder. In an example, tile 1 is allocated to the first video processing engine 102, tile 2 is allocated to the second video processing engine 104, tile 3 is allocated to the third video processing engine 106 and tile 4 is allocated to the fourth video processing engine 108. Tile 1 is adjacent to tile 2, tile 2 is adjacent to tile 3 and tile 3 is adjacent to tile 4.

The first video processing engine 102 processes Row 1 of tile 1 and generates a set of parameters and partially filtered pixels corresponding to a last LCU in Row 1 of tile 1, for example LCU 2. The second video processing engine 104 is initiated to process Row 1 of tile 2 on receiving the set of parameters and partially filtered pixels from the first video processing engine 102. Similarly, the second video processing engine 104 processes Row 2 of tile 2 when it receives a set of parameters and partially filtered pixels corresponding to Row 2 of tile 1 from the first video processing engine 102.

A state in which each video processing engine in the video encoder 100 is initiated to process its respective tile is referred to as the pipe-up state 302. In one example, when the video encoder 100 has four video processing engines, the time from the initiation of the first video processing engine 102 to the time of the initiation of the fourth video processing engine 108 represents the pipe-up state 302. Thus, in the pipe-up state 302, when the first video processing engine 102 is processing Row 3 of tile 1, the second video processing engine 104 processes Row 2 of tile 2, the third video processing engine 106 processes Row 1 of tile 3 and the fourth video processing engine 108 is initiated to process tile 4. In a steady state 304, all the video processing engines in the video encoder 100 process their respective allocated tiles in parallel. As illustrated, in the steady state 304, the first video processing engine 102 processes Row 4 of tile 1, the second video processing engine 104 processes Row 3 of tile 2, the third video processing engine 106 processes Row 2 of tile 3 and the fourth video processing engine 108 processes Row 1 of tile 4, in parallel.

A state in which each video processing engine in the video encoder 100 processes a last row in its respective tile is represented as the pipe-down state 306. Thus, the time when the first video processing engine 102 processes Row N of tile 1 to the time when the fourth video processing engine 108 processes Row N of tile 4 represents the pipe-down state 306. Row N represents a last row in the respective tiles. For example, Row 8 in FIG. 3 represents a last row of tile 1, tile 2, tile 3 and tile 4.
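
Assuming each engine takes one time slot per row and M = 1, the pipe-up, steady and pipe-down states fall out of simple arithmetic: engine e lags engine e-1 by one row. The following sketch prints the resulting wavefront schedule; the one-slot-per-row timing model and the state-boundary labels are assumptions drawn from the description of FIG. 3:

```python
def schedule(num_engines=4, rows=8, m=1):
    # Engine e (1-based) processes Row r of its tile in slot r + (e - 1) * m.
    # Labels follow FIG. 3: pipe-up until the last engine starts, pipe-down
    # from the slot in which the first engine reaches its last row (Row N).
    for slot in range(1, rows + (num_engines - 1) * m + 1):
        active = [f"engine {e}: Row {slot - (e - 1) * m}"
                  for e in range(1, num_engines + 1)
                  if 1 <= slot - (e - 1) * m <= rows]
        state = ("pipe-up" if slot < 1 + (num_engines - 1) * m
                 else "pipe-down" if slot >= rows
                 else "steady")
        print(f"slot {slot:2d} ({state}): " + ", ".join(active))

schedule()  # slot 4 (steady): engine 1: Row 4, ..., engine 4: Row 1
```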

FIG. 4 illustrates a flowchart 400 of a method of video encoding, according to an embodiment. At step 402, a video is received that includes a plurality of frames. At step 404, each frame of the plurality of frames is divided into a plurality of tiles. A height of each tile is equal to a height of the frame. In this example, a frame is divided into three tiles, a first tile, a second tile and a third tile, and the height of each tile is equal to the height of the frame. The video encoder includes a first video processing engine, a second video processing engine and a third video processing engine. At step 406, the first tile, the second tile and the third tile are allocated to the first video processing engine, the second video processing engine and the third video processing engine, respectively. At step 408, the first video processing engine processes M rows of the first tile, where M is an integer. The processing of M rows generates a set of parameters and partially filtered pixels corresponding to a set of LCUs in the M rows of the first tile. In one example, the set of LCUs includes a last LCU in each of the M rows. The last LCU in each of the M rows shares a boundary with an LCU in the second tile. The second video processing engine is initiated on receiving the set of parameters and the partially filtered pixels from the first video processing engine, at step 410.

At step 412, the second video processing engine processes K rows of the second tile, where K is an integer. The processing of K rows generates a set of parameters and partially filtered pixels corresponding to a set of LCUs in the K rows of the second tile. In one example, the set of LCUs includes a last LCU in each of the K rows. The last LCU in each of the K rows shares a boundary with an LCU in the third tile. The third video processing engine is initiated on receiving the set of parameters and the partially filtered pixels from the second video processing engine, at step 414.
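
The row-level dependency that drives steps 408 through 414 can be expressed with ordinary thread synchronization. The following is a minimal sketch, assuming M = K = 1: each engine publishes a per-row event after encoding a row of its tile, and the next engine waits on its neighbour's matching row before proceeding (the encoding work itself is stubbed out):

```python
import threading

def run_pipeline(num_engines=3, rows=8):
    # row_done[e][r] is set once engine e has encoded row r of its tile
    # and posted the boundary parameters and partially filtered pixels.
    row_done = [[threading.Event() for _ in range(rows)]
                for _ in range(num_engines)]

    def engine(e):
        for r in range(rows):
            if e > 0:
                # M = K = 1: wait for the adjacent tile's row r, whose last
                # LCU supplies the parameters this tile's row depends on.
                row_done[e - 1][r].wait()
            # ... encode row r of tile e here (steps 408 and 412) ...
            row_done[e][r].set()  # publish boundary data for the next engine

    threads = [threading.Thread(target=engine, args=(e,))
               for e in range(num_engines)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run_pipeline()
```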

FIG. 5 illustrates a computing device 500, according to an embodiment. The computing device 500 is, or is incorporated into, a mobile communication device, such as a mobile phone, a personal digital assistant, a transceiver, a personal computer, or any other type of electronic system. The computing device 500 may include one or more additional components known to those skilled in the relevant art that are not discussed here for simplicity of the description.

In some embodiments, the computing device 500 comprises a megacell or a system-on-chip (SoC) which includes a processing unit 512 such as a CPU (Central Processing Unit), a memory module 515 (e.g., random access memory (RAM)) and a tester 510. The processing unit 512 can be, for example, a CISC-type (Complex Instruction Set Computer) CPU, a RISC-type (Reduced Instruction Set Computer) CPU, or a digital signal processor (DSP). The memory module 515 (which can be memory such as RAM, flash memory, or disk storage) stores one or more software applications 530 (e.g., embedded applications) that, when executed by the processing unit 512, perform any suitable function associated with the computing device 500. The tester 510 comprises logic that supports testing and debugging of the computing device 500 executing the software applications 530. For example, the tester 510 can be used to emulate a defective or unavailable component(s) of the computing device 500 to allow verification of how the component(s), were it actually present on the computing device 500, would perform in various situations (e.g., how the component(s) would interact with the software applications 530). In this way, the software applications 530 can be debugged in an environment which resembles post-production operation.

The processing unit 512 typically comprises memory and logic which store information frequently accessed from the memory module 515. A camera 518 is coupled to the processing unit 512. The computing device 500 includes a video processing unit 516. The video processing unit 516 is coupled to the processing unit 512 and the camera 518. The video processing unit 516 includes a video encoder 520. The video encoder 520 is similar to the video encoder 100 (illustrated in FIG. 1) in connection and operation. The image/video data shot by the camera 518 is processed in the video processing unit 516.

The video encoder 520 includes a plurality of video processing engines. The video data in the video encoder 520 is processed by dividing the frames in the video data into a plurality of tiles, where the height of each tile is equal to the height of the frame. Each tile is allocated to a video processing engine. The video encoder 520 uses a loop filter in each of the video processing engines, and hence the quality of the video processed in the video encoder 520 is not degraded. The video encoder 520 helps in achieving the high performance needed to implement ultra-HD (4K) video playback and recording. As discussed earlier, a video decoder works on the same principle as the video encoder 100 (illustrated in FIG. 1). Hence, the video encoder 520 in one embodiment is a video decoder.

FIG. 6 is an example environment in which various aspects of the present disclosure may be implemented. As shown, the environment may comprise, for example, one or more video cameras 610, computers 620, personal digital assistants (PDAs) 630, mobile devices 640, televisions 650, video conference systems 660, video streaming systems 680, TV broadcasting systems 670 and communication networks/channels 690.

The video cameras 610 are configured to take continuous pictures and generate digital video, a signal comprising a sequence of image frames. The video cameras 610 are configured to process the image frames for efficient storage and/or for transmission over the communication networks/channels 690. The computers 620, PDAs 630 and mobile devices 640 are configured to encode video signals for transmission and to decode encoded video signals received from the communication networks/channels 690. The video streaming systems 680 are configured to encode video signals and to transmit the encoded video signals over the communication networks/channels 690, responsive to a received request and/or asynchronously. The television broadcasting systems 670 are configured to process video signals in accordance with one or more broadcast technologies and to broadcast the processed video signals over the communication networks/channels 690. The video conference systems 660 are configured to receive a video signal from one or more participating/conferencing end-terminals (not shown) and to convert or compress the video signal for broadcasting or for transmitting to other participating user terminals. The television broadcasting systems 670 are also configured to receive encoded video signals from one or more different broadcasting centers (or channels), to decode each video signal and to display the decoded video signals on a display device (not shown).

As shown in FIG. 6, the devices and systems 610-680 are coupled to communication networks/channels 690. The communication networks/channels 690 support an exchange of video signals encoded in accordance with one or more video encoding standards such as, but not limited to, H.263, H.264/AVC, and HEVC (H.265), for example. Accordingly, the devices and systems 610-680 are required to process (encode and/or decode) video signals complying with such standards. The systems and devices 610-680 are implemented with one or more functional units that are configured to perform signal processing, transmitting and/or receiving of video signals from the communication networks/channels 690. When each device in the described environment performs video encoding or decoding, one or more embodiments described in this disclosure are used.

In the foregoing discussion, the term “connected” means at least either a direct electrical connection between the devices connected or an indirect connection through one or more passive intermediary devices. The term “circuit” means at least either a single component or a multiplicity of passive or active components that are connected together to provide a desired function. The term “signal” means at least one current, voltage, charge, data, or other signal. Also, the terms “connected to” or “connected with” (and the like) are intended to describe either an indirect or direct electrical connection. Thus, if a first device is coupled to a second device, that connection can be through a direct electrical connection, or through an indirect electrical connection via other devices and connections. The terms “inactivation” or “inactivated” or turned “OFF” are used to describe a deactivation of a device, a component or a signal. The terms “activation” or “activated” or turned “ON” describe an activation of a device, a component or a signal.

Modifications are possible in the described embodiments, and other embodiments are possible, within the scope of the claims.

What is claimed is:
1. A video encoder configured to receive a frame and configured to divide the frame into a plurality of tiles, the video encoder comprising: a plurality of video processing engines communicatively coupled with each other, each video processing engine configured to receive a tile of the plurality of tiles, wherein a height of each tile is equal to a height of the frame, and each tile comprises a plurality of rows; and a first and a second video processing engine of the plurality of video processing engines, the second video processing engine being initiated after the first video processing engine processes M rows of the plurality of rows of the tile, where M is an integer.
2. The video encoder of claim 1, wherein a width of each tile is equal to a width of the frame divided by a number of video processing engines of the plurality of video processing engines.
3. The video encoder of claim 1, wherein in a steady state of the video encoder, the first video processing engine is configured to process a first tile and the second video processing engine processes a second tile in parallel, the first tile and the second tile are adjacent tiles of the plurality of tiles.
4. The video encoder of claim 3, wherein the first video processing engine is configured to process M rows of the first tile before the second video processing engine is initiated to process the second tile.
5. The video encoder of claim 1, wherein each row of the plurality of rows comprises a plurality of LCUs (largest coding units) and each LCU comprises a plurality of pixels.
6. The video encoder of claim 1, wherein each video processing engine comprises a loop filter, the loop filter includes a work memory configured to store a set of parameters and partially filtered pixels corresponding to a set of LCUs.
7. The video encoder of claim 1 further comprising a shared memory coupled to the plurality of video processing engines.
8. The video encoder of claim 1, wherein the work memory in each video processing engine is coupled to the shared memory.
9. The video encoder of claim 1, wherein: the first video processing engine, on processing M rows of the first tile, generates a set of parameters and partially filtered pixels corresponding to a set of LCUs in the M rows of the first tile; the set of parameters and the partially filtered pixels are stored in a work memory associated with the first video processing engine; the set of parameters and the partially filtered pixels are provided from the work memory to the shared memory; and the set of parameters and the partially filtered pixels are provided from the shared memory to the second video processing engine for processing the second tile.
10. A method of video encoding comprising: receiving a plurality of frames; dividing each frame of the plurality of frames into a plurality of tiles such that a frame comprises a plurality of tiles and a height of each tile is equal to a height of the frame, and wherein each tile comprises a plurality of rows; and initiating a second video processing engine to process a second tile after a first video processing engine processes M rows of a first tile, where M is an integer, and wherein the first tile and the second tile are adjacent tiles of the plurality of tiles.
11. The method of claim 10 further comprising processing the first tile by the first video processing engine and the second tile by the second video processing engine in parallel during a steady state.
12. The method of claim 10, wherein processing M rows of the first tile in the first video processing engine generates a set of parameters and partially filtered pixels corresponding to a set of LCUs (largest coding units) in the M rows of the first tile, wherein each row comprises a plurality of LCUs.
13. The method of claim 10, wherein initiating the second video processing engine further comprises activating the second video processing engine on receiving the set of parameters and the partially filtered pixels from the first video processing engine.
14. The method of claim 10, wherein a width of each tile is equal to a width of the frame divided by a number of video processing engines in the plurality of video processing engines.
15. The method of claim 10 further comprising initiating a third video processing engine to process a third tile after the second video processing engine processes K rows of the second tile, where K is an integer, and wherein the third tile is adjacent to the second tile.
16. A video decoder configured to receive a compressed bit-stream corresponding to a frame, the frame including a plurality of tiles, the video decoder comprising: a plurality of video processing engines communicatively coupled with each other, each video processing engine configured to receive a compressed bit-stream corresponding to a tile of the plurality of tiles, wherein each tile comprises a plurality of rows, and wherein a height of each tile is equal to a height of the frame and a width of each tile is equal to a width of the frame divided by a number of video processing engines in the video decoder; and a first and a second video processing engine of the plurality of video processing engines, the second video processing engine being initiated after the first video processing engine processes a compressed bit-stream corresponding to M rows of the plurality of rows of the tile, where M is an integer.
17. The video decoder of claim 16, wherein: the first video processing engine is configured to process a compressed bit-stream corresponding to a first tile and the second video processing engine is configured to process a compressed bit-stream corresponding to a second tile, the first tile and the second tile being adjacent tiles of the plurality of tiles; the first video processing engine is configured to process the compressed bit-stream corresponding to M rows of the first tile and configured to generate a set of parameters and partially filtered pixels corresponding to a set of LCUs in the M rows of the first tile; and the second video processing engine is initiated on receiving the set of parameters and partially filtered pixels from the first video processing engine.
18. A computing device comprising: a processing unit; a memory module coupled to the processing unit; and a video encoder coupled to the processing unit and the memory module, the video encoder configured to receive a frame and configured to divide the frame into a plurality of tiles, the video encoder comprising: a plurality of video processing engines communicatively coupled with each other, each video processing engine configured to receive a tile of the plurality of tiles, wherein a height of each tile is equal to a height of the frame, and each tile comprises a plurality of rows; and a first and a second video processing engine of the plurality of video processing engines, the second video processing engine being initiated after the first video processing engine processes M rows of the plurality of rows of the tile, where M is an integer.
19. The computing device of claim 18, wherein processing M rows of the first tile in the first video processing engine generates a set of parameters and partially filtered pixels corresponding to a set of LCUs (largest coding units) in the M rows of the first tile, wherein each row comprises a plurality of LCUs.
20. The computing device of claim 18, wherein the second video processing engine is initiated on receiving the set of parameters and the partially filtered pixels from the first video processing engine.